AITopics | hierarchical rl

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Neural Information Processing SystemsDec-25-2025, 00:16:49 GMT

Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems. Our approach learns an instruction-following low-level policy and a high-level policy that can reuse abstractions across tasks, in essence, permitting agents to reason using structured language. To study compositional task learning, we introduce an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine. We find that, using our approach, agents can learn to solve to diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis find that the compositional nature of language is critical for learning and systematically generalizing sub-skills in comparison to non-compositional abstractions that use the same supervision.

abstraction, hierarchical deep reinforcement learning, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)

Add feedback

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Neural Information Processing SystemsOct-9-2024, 12:20:18 GMT

Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems.

artificial intelligence, hierarchical deep reinforcement learning, machine learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Jiang, YiDing, Gu, Shixiang (Shane), Murphy, Kevin P., Finn, Chelsea

Neural Information Processing SystemsMar-19-2020, 00:30:21 GMT

Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems.

abstraction, hierarchical deep reinforcement learning, hierarchical rl, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

Nachum, Ofir, Gu, Shixiang, Lee, Honglak, Levine, Sergey

arXiv.org Artificial IntelligenceOct-2-2018

We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. In such hierarchical structures, a higher-level controller solves tasks by iteratively communicating goals which a lower-level policy is trained to reach. Accordingly, the choice of representation -- the mapping of observation space to goal space -- is crucial. To study this problem, we develop a notion of sub-optimality of a representation, defined in terms of expected reward of the optimal hierarchical policy using this representation. We derive expressions which bound the sub-optimality and show how these expressions can be translated to representation learning objectives which may be optimized in practice. Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods (see videos at https://sites.google.com/view/representation-hrl).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1810.01257

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Diversity is All You Need: Learning Skills without a Reward Function

Eysenbach, Benjamin, Gupta, Abhishek, Ibarz, Julian, Levine, Sergey

arXiv.org Artificial IntelligenceJun-6-2018

Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN ("Diversity is All You Need"), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1802.0607

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education > Focused Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hierarchical Policy Search via Return-Weighted Density Estimation

Osa, Takayuki (University of Tokyo / RIKEN) | Sugiyama, Masashi (RIKEN / University of Tokyo)

AAAI ConferencesFeb-8-2018

Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL). Hierarchical RL (HRL) tackles this problem by learning a hierarchicalpolicy, where multiple option policies are in charge of different strategies corresponding to modes of a reward function and a gating policy selects the best option for a given context. Although HRL has been demonstrated to be promising, current state-of-the-art methods cannot still perform well in complex real-world problems due to the difficulty of identifying modes of the reward function. In this paper, we propose a novel method called hierarchical policy search via return-weighted density estimation (HPSDE), which can efficiently identify the modes through density estimation with return-weighted importance sampling. Our proposed method finds option policies corresponding to the modes of the return function and automatically determines the number and the location of option policies, which significantly reduces the burden of hyper-parameters tuning. Through experiments, we demonstrate that the proposed HPSDE successfully learns option policies corresponding to modes of the return function and that it can be successfully applied to a motion planning problem of a redundant robotic manipulator.

hpsde, option policy, return function, (15 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: